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5-H , Abstract. Monte-Carlo simulations are used in the gravitational wave burst 

t^i detection community to demonstrate and compare the properties of different 

■^^ ' search techniques. We note that every Monte-Carlo simulation has a 

' corresponding optimal search technique according to both the non-Bayesian 

t""^ ' Neyman-Pearson criterion and the Bayesian approach, and that this optimal 

search technique is the Bayesian statistic. When practical, we recommend deriving 
the optimal statistic for a credible Monte-Carlo simulation, rather than testing 
^ , 1 ad hoc statistics against that simulation. 
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1. Introduction 

A number of different statistics [HElEllllElElIIlIHlElIiniin] have been proposed to 
detect bursts of gravitational waves with the global network of gravitational wave 
interferometers. With the exception of [31 21 [TT] these have been derived in a 
frequentist or ad hoc framework. The tool used by the community to demonstrate 
and compare their performance is the well-known Monte-Carlo simulation. The 
simulations are intuitive and easy to implement [12j . To perform a Monte-Carlo 
simulation requires generating many sample observations from two populations, one 
consisting of background noise only and the other background noise with gravitational 
wave signals added (or "injected", in community terminology). 

The author, with Sutton, Tinto and Woan, has elsewhere [TT] proposed a Bayesian 
statistic for the detection of gravitational wave bursts. Conventionally, the next step 
would be to test its performance using a Monte-Carlo simulation (and such a study 
is forthcoming) . Yet the notion of testing a Bayesian gravitational wave burst search 
with a Monte-Carlo simulation raises some novel issues. Adopting concrete models 
for noise and signal is required by both Monte-Carlo simulations and a Bayesian 
analysis, and moreover the Bayesian statistic so defined can be shown to be the 
best possible statistic for the corresponding Monte-Carlo simulation under the non- 
Bayesian Neyman-Pearson criterion. Performing a Monte-Carlo simulation will not 
determine if the corresponding Bayesian statistic performs better than other statistics; 
it can only quantify how much better its performance is. 

The relationship between Monte-Carlo simulations and Bayesian analyses is due 
to their common assertion that, despite our incomplete knowledge of the population 
of gravitational wave bursts, we can learn something by proceeding as if gravitational 
wave bursts had some particular distribution. In the Bayesian case, this distribution is 
the prior plausibility distribution for the signal model. In the Monte-Carlo case, this is 
the probability distribution we sample from to produce the signal population. In both 
cases, there is no right choice, but there are many wrong choices that contradict 
our physical knowledge, and ways to prefer some choices over others on grounds 
of including more physical knowledge. The credibility we attach to a Monte-Carlo 
simulation or a Bayesian analysis hinges on if we are satisfied that the signal (and noise) 
models involved adequately represents our state of knowledge about the universe. 

In i)2.1l we formally review Monte-Carlo simulations as typically conducted by the 
bursts community. In i i2.2l we note the Neyman-Pearson criterion, and its relationship 
to Monte-Carlo simulations. In i i2.3l we look at how statistics and signal models imply 
each other and what this means for papers that use two conflicting definitions. In 
H2A\ we discuss the equivalence between the Monte-Carlo simulation signal model 
and the Bayesian signal prior. In ^2.5\ we note some conditions that can restrict the 
applicability of our conclusions. 

2. Analysis 

2.1. Monte- Carlo simulations 

In the problem of gravitational wave burst detection, we wish to classify an observation 
X as either signal or noise. A deterministic rule will classify every possible observation 
as either signal or noise, partitioning the space of observations into two regions i?signai 
and i?,ioiso- The classification will be imperfect, because it is possible for any particular 
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observation to arise from signal or noise (for example, an observation could be due 
to a gravitational wave, or it could be due to a large noise excursion.) Misclassifying 
a signal as noise is a false dismissal; misclassifying noise as a signal is a false alarm. 
Different classification rules will partition the space of observations into different 
regions, and make different mistakes for different kinds of observation. To compare 
classification rules, we can perform a Monte-Carlo simulation. 

First consider the noise hypothesis i/noisc- To perform the Monte-Carlo 
simulation, we need many observations x corresponding to many different realizations 
of the noise. A simple Monte-Carlo simulation might create many white noise time- 
series, each with n time samples. In this case, we are sampling the distribution 

p{K\Hnoisc) = (27r)"^ exp(--x^x). (I) 

A realistic simulation might instead use previous observations of the gravitational 
wave observatory. In this case, we are still sampling a distribution p(x|_ffnoisc), but we 
do not have an explicit form for the distribution. 

We can compute the false alarm, probability pp (the fraction of noise realizations 
misclassified as signals) for the classification rule by evaluating the integral 



Pf = p(x|i7noiso)dx. (2) 

Note that we can only do this once we have chosen a distribution p(x|iJ,ioisc); it is not 
an intrinsic property of the classification rule. This is because we are measuring the 
portion of the specified distribution inside region -Rsignai- 

The integral evaluates a potentially implicitly defined distribution over a 
potentially implicitly defined region. The technique of Monte-Carlo integration with 
importance sampling fits exactly this situation J12j . Instead of sampling the space of 
observations uniformly, we sample it with distribution p(x|iJnoise)- The integral then 
becomes 



PF- n l.C"rr^' (;,(x|iI„o,e)dx) (3) 




1 if X e i?signal 

otherwise 

1 if Xi e i?signal 1 \ r.\ 

otherwise ( / ^ ' 

1=1 " 

for A'^ sample observations x^ drawn from p{x.\Haoisc)- This is how the intuitively 
obvious counting strategy, where we draw many samples and count how many lie in 
the region, is related to the formal definition of the false alarm fraction. 

Now consider the hypothesis ilgignai- To perform a Monte-Carlo simulation, we 
need many observations x^ drawn from a distribution p(x|iJsignai)- In the literature, 
this distribution is typically implicitly defined by a procedure like the one we now 
describe: 

A waveform h is drawn from some distribution p(h). Often the distribution is a 
set of a few discrete waveforms h.^ : 

-, Nh 

P(h) = ]^E'5(h-hO- (5) 

^ i—l 

An amplitude a for the injection is drawn from a distribution p(cr), again often from 
a few discrete choices: 

p{<y) = Trll^^^~^^)- (6) 

(7 -t 
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A direction on the sky (0, 0) is drawn from a distribution p{9, 0), usually representing 
a uniform distribution on the sky: 

Pi0,^)^p{e)p{<p) (7) 

sine 0<^<'^lf^ -7r<0<7r1 
otherwise J [ otherwise J ' 

The direction {6, (p) allows the computation of the linear response of the network 
to strain, a matrix F(e, (p) including antenna patterns and time-delays. Finally, 
background noise e is drawn from the distribution p(e|ifsignai) = p(x|-ffnoiso)- 

The sample of p(x|_ffsignai) is then constructed as by adding the strain, 
transformed by the response, and scaled by the amplitude, to the background noise: 

X = crF • h + e (9) 

or, expressed as a distribution, drawing x from 

p(x|h, cr, 6*, </), e) =(5(x-crF-h-e) (10) 

As we know the distributions of all the components, we can compute an explicit 
expression for the iJsignai distribution 

p(x|i7signai) = / p(x|h, CT, 0, 0, e)p(h)p(cr)p(6', 0)p(e)dhdCTded(/)de. 

(11) 
This integral explores all possible combinations of parameters, weights them by their 
joint probability of occurring, and then assigns that probability to the observation they 
will result in. The resulting expression for the distribution is unwieldy, but samples 
can be readily drawn from it by following the procedure above. (The signal model 
so described, with a handful of waveforms at a handful of fixed distances, is far from 
physical.) 

The detection probability pjj (the fraction of realizations of the signal hypothesis 
correctly classified as signals) can be evaluated like the false alarm probability: 




PD^ p(x|iJsignal)dx (12) 

J otherlfsr" } (M-l^-.n-)dx) (13) 

1 if Xi G i?signal 1 \ /^^x 

otherwise J / 

where the observations x^ are in this case drawn from the signal distribution 
p(x|_ffsignai). Again, the detection fraction computed depends on the signal 
distribution chosen; it is not an intrinsic property of the classification rule. 

The false alarm and detection probabilities tell us something about the 
performance of the classification rule. Small false alarm probabilities and large 
detection probabilities are desirable; they mean the classification rule will make few 
mistakes, but this quality is contingent on the signal and noise hypotheses accurately 
representing the physical system we want to apply the rule to. In gravitational wave 
detection, we wish to set the false alarm probability to some acceptable level, such as 
one false alarm in one hundred years of observation time. For this fixed false alarm 
probability, we want to choose the classification rule that maximizes the detection 
probability; this is the Neyman- Pearson criterion we explore in the next section. 
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2.2. The optimal statistic 

One way to form a classification rule is to compute a function (or statistic) of the 
observation /(x), and compare it against a threshold A. Then 

X e i?sig„al ^ /(x) > A (15) 

X e i?„oisc ^ /(x) < A (16) 

An example statistic is the (dimensionless quantity proportional to) energy, /(x) — 

T 
x X. 

If we vary the threshold A we can produce a family of decision rules, and the 
detection pjj and false alarm pp probabilities become functions of A, pdW and pp{X). 
The two-dimensional curve parameterized by A, (pi?(A),p£)(A)) is called the receiver 
operating characteristic. An approximation to the receiver operating characteristic 
curve can be computed as cheaply as a single Monte-Carlo evaluation for pp and 
Pd, by recording /(x) for each realization rather than only noting if it exceeds A. 
Approximations to pp{X) and pp{X) can then be cheaply computed by counting the 
fraction of the precomputed values above the threshold, but the estimates so obtained 
are not independent for different values of A. 

The frequcntist Neyman-Pcarson criterion [13] says that we should choose our 
classification rule to maximize the detection probability pjj for a given false alarm 
probability pp. The maximization is over the space of all possible classification rules. 
There exists an explicit solution to this optimization problem given by the Neyman- 
Pearson lemma: 

A(x) = pM^^^ > A (17) 

p(x|iynoiso) 

for A yielding the desired pp{\). A is called the likelihood ratio. It is important to note 
that this optimality is for the particular signal model adopted for the Monte-Carlo 
simulation. It is not a claim for all signal models. 

It is easy to see why this choice is optimal. Break R" into infinitesimal pieces 
dx. We want to incrementally construct i?signai from these pieces. The best 
piece, according to the Neyman-Pearson criterion, brings the most signal realizations 
for a given number of noise realizations. This is the piece with the maximal 
(p(x|_ffsignai)dx)/(p(x|_ffnoisc)dx) = A(x). The ucxt bcst piece has the next highest 
A(x) and so on. We take pieces in this fashion until pp for their union reaches the 
desired value, which is the same as taking all pieces with A(x) > A. 

The claim that A(x) is the optimal statistic may appear to be in conflict with the 
non-existence of a uniformly most powerful test in this situation [13| . The concept of a 
uniformly most powerful test refers to parameterized hypotheses, and in this scenario it 
means that if we seek to decide between a specific signal -ffh.e.^.o- and i?noiso, the most 
powerful test (i.e. Rh.9,(j>,a-) will vary (not be uniform,) depending on which specific 
signal (h, 9, 0, a) we consider. This is not the case for the Monte-Carlo simulation, 
where we do not look for a specific signal, but instead for any signal from some specific 
distribution. The non-existence of the uniformly most powerful test is a special case 
of the observation that the optimal statistic depends on the signal model we adopt. 

2.3. Implicit and explicit signal models 

We can infer the optimal statistic for a given signal and noise model, but we 
can also work backwards from an arbitrary statistic to infer the signal and noise 
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distributions it is the optimal statistic for. In general, the identification cannot be 
made uniquely; arbitrary strictly increasing functions of one statistic yield another 
statistic with identical (though differently parameterized) classification regions; nor 
can the likelihood ratio uniquely define its numerator and denominator distributions. 
However, when we have access to the derivation of the statistic, as in [H [H [5l [7l [H [9] , 
where the noise model is explicitly asserted and the relationship of the statistic to 
probability is known (typically the logarithm of a probability ratio), we can readily 
extract a "natural" member of the set of statistics that make the same classifications. 
In TT] this is done for several statistics used in the community and it is found that they 
implicitly correspond to unphysical or even improper signal distributions: a population 
of gravitational waves whose energies at Earth are infinite, infinitesimal or have some 
particular value and are anisotropically distributed across the sky. 

When a proposed statistic is tested against a Monte-Carlo simulation, the paper 
effectively proposes two physical models, one implicit in the statistic and one more 
explicit in the simulation. When these models differ (as they do in the papers 
cited above) the paper contradicts itself, and the test intended to demonstrate the 
statistics efficacy instead suggests a different statistic. This optimal statistic makes 
its gains by fully adapting to the simulation; if the simulation uses only a handful of 
waveforms and distances it can even outperform a matched filter bank (which searches 
all distances). We might view the optimal statistic as "cheating", but the fault lies 
with the unphysicality of the simulation; to even form the notion of cheating is to 
admit that the Monte-Carlo simulation we have used is not compatible even with our 
incomplete state of knowledge about the physical signal population. More importantly, 
testing with such a simulation does not conclusively demonstrate that the proposed 
statistic is itself not cheating in some fashion. An unphysical Monte-Carlo simulation 
can give us no assurances about a "black box" statistic. 

A better path is simply to explicitly propose a physically credible model and then 
derive the optimal statistic for it. 

2.4- Bayesian interpretation 

The Bayesian ^4j solution to the classification problem is 

EiEp^ > 1 (18) 

where the posterior ("after" we consider the observation) plausibility ratio is defined 
as the product of the Bayes factor and the prior ( "before" we consider the observation) 
plausibility ratio: 

P(gsignal|x) ^ P(x|gsignal)p(gsignal) , . 

P(^noiso|x) p(x|iJ„oiso) p{Hnoisc) 

The Bayes factor is the ratio of how well each hypothesis predicted the observation 
x; it tells us how much we should change our initial opinion, represented by the 
prior plausibility ratio, into our final opinion, represented by the posterior plausibility 
ratio. It is computed by forming marginalization integrals over nuisance parameters 
and their prior distributions 

p(x|gsignai) _ / p(x|g, 9, (p, h, e, Hsignai)p{cr)p{d, (f>)p{h)p{e)da d9 dcf) dh de 

p(x|iJnoise) p(x|iJnoiso) 

(20) 
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The mathematics involved is identical to the Monte-Carlo simulation. The Monte- 
Carlo simulation's parameter probability distributions are the Bayesian analyses' 
prior plausibility distributions. The Monte-Carlo procedure of sampling them is the 
same as Bayesian marginalization over them. The Bayes factor is the likelihood 
ratio. The likelihood threshold A is the inverse of the Bayesian prior plausibility 
ratio p{Hsigna.i) / p{H„_oiBc) ■ The Bayesian solution to the classification problem is the 
Neyman-Pearson optimal statistic for the Monte-Carlo simulation. 

The close relationship between Monte-Carlo simulations and Bayesian analysis is 
simply explained. Both assert that, despite incomplete knowledge of the physical 
signal model, we can learn something by proceeding with a signal model that is 
consistent with what we do know (and appropriately non-committal with respect to 
what we don't know). 

The notion of testing the Bayesian statistic with a Monte-Carlo simulation clarifies 
the inconsistency implicit in Monte-Carlo testing ad hoc statistics. If we test the 
Bayesian statistic with a different signal model, we must mean that we believe that 
signal model to be more plausible than the one used to derive the Bayesian statistic, 
in which case we should have derived the Bayesian statistic from it in the first place. 
(Testing with different models lets us measure relative "robustness", but the proper 
way to be robust is to use a signal model which is the appropriately weighted average 
of all the different signal models we hope to be robust over.) Of course, performing 
a Monte-Carlo simulation can still be valuable, allowing us to evaluate the receiver- 
operating characteristic, validate the implementation, or compare performance to ad 
hoc statistics, but by definition it will not shed further light on if the optimal statistic 
is or is not better than other statistics. 

2.5. Limitations 

None of the above suggests that the statistics in [H [2 O [71 [H [9] do not work, or even 
that they do not work well, but only that they do not work as well as is possible. 
Fortunately, the simplicity of the signal models employed in these searches means 
that they are likely to be sufficiently sampled by the limited Monte-Carlo simulations 
they are tested with (i.e. unlikely to "cheat"). The main drawback is the inability to 
produce a single-number event rate for a disparate and properly spatially distributed 
source population, a limitation that is not seen as major given the community's focus 
on computing detection ranges for given source energies. 

Practically speaking, it is only possible to derive an optimal statistic when the 
signal and noise models are known. In the case of the real instruments, the noise model 
is not fully known, and any Monte-Carlo simulation that draws on the real instrument 
therefore has an unknown optimal statistic. The best we can do in these cases is to 
form a noise model that includes all our knowledge about the noise, exactly as we 
do for the signal model. Even when all distributions are known, the optimal statistic 
may be impractical to compute, especially when our signal and/or noise distributions 
are richly informative. In either of these two scenarios, it is plausible that an ad hoc 
statistic can be more practical. 

However, we are not necessarily in that regime. The statistics proposed in 
[U [21 El [II [3 [9] are special cases of a Bayesian model proposed in [TI] which can be 
implemented at a comparable computational cost while using a more physical signal 
model. [lOj explicitly considers noise bursts and attempts to reject them by computing 
an "incoherent energy" statistic, but the motivating "bursty" noise model can also be 



Monte- Carlo and Bayesian techniques in gravitational wave burst data analysis 8 
efficiently implemented in a Bayesian statistic, as outlined in [11] . 

3. Conclusion 

The conceptual simplicity of Monte-Carlo simulations have made them a popular tool 
to characterize the performance of gravitational wave burst searches. They require the 
explicit adoption of a signal model in much the same fashion as a Bayesian analysis. 
Ad hoc proposals for statistics have not been optimal for the simulations they have 
been evaluated with; insofar as we trust these simulations to evaluate performance, we 
should consider, when practicable, going directly to the optimal statistics they define in 
both the frequentist and Bayesian paradigms. This trust can be better earned by using 
more physically plausible Monte-Carlo simulations: a physical continuous distribution 
of distances, orientations and inclinations instead of a handful of fixed distances, and 
a continuously parameterized and physically motivated model for waveforms instead 
of a handful of injection templates. 
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